Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar

Authors

  • Andreas van Cranenburgh
  • Remko Scha
  • Federico Sangati
Abstract

Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the fact that both DOP and discontinuity present formidable challenges in terms of computational complexity, the model is reasonably efficient, and surpasses the state of the art in discontinuous parsing.

Similar resources

Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity

It has long been argued that incorporating a notion of discontinuity in phrase-structure is desirable, given phenomena such as topicalization and extraposition, and particular features of languages such as cross-serial dependencies in Dutch and the German Mittelfeld. Up until recently this was mainly a theoretical topic, but advances in parsing technology have made treebank parsing with discont...

Rich Statistical Parsing and Literary Language

This thesis applies the Data-Oriented Parsing framework in two areas: parsing & literature. The data-oriented approach rests on the assumption that re-use of chunks of training data can be detected and exploited at test time. Syntactic tree fragments form the common thread in the thesis. Chapter 2 presents a method to efficiently extract them from treebanks, based on heuristic...

Discontinuous Parsing with an Efficient and Accurate DOP Model

We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We g...

Discontinuity and Non-Projectivity: Using Mildly Context-Sensitive Formalisms for Data-Driven Parsing

We present a parser for probabilistic Linear Context-Free Rewriting Systems and use it for constituency and dependency treebank parsing. The choice of LCFRS, a formalism with an extended domain of locality, enables us to model discontinuous constituents and non-projective dependencies in a straightforward way. The parsing results show that, firstly, our parser is efficient enough to be used for...

Polynomial Pregroup Grammars parse Context Sensitive Languages

Pregroup grammars with a possibly infinite number of lexical entries are polynomial if the length of type assignments for sentences is a polynomial in the number of words. Polynomial pregroup grammars are shown to generate the standard mildly context sensitive formal languages as well as some context sensitive natural language fragments of Dutch, Swiss German or Old Georgian. A polynomial recogn...

Publication date: 2011